{
  "nbformat": 4,
  "nbformat_minor": 2,
  "metadata": {
    "language_info": {
      "name": "python",
      "codemirror_mode": {
        "name": "ipython",
        "version": 2
      },
      "version": "2.7.12"
    },
    "orig_nbformat": 2,
    "file_extension": ".py",
    "mimetype": "text/x-python",
    "name": "python",
    "npconvert_exporter": "python",
    "pygments_lexer": "ipython2",
    "version": 2
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": { },
      "outputs": [ ],
      "source": [
        "# SPTAG python wrapper tutorial \n",
        "\n",
        "To end-to-end build a vector search online service, it contains two steps:\n",
        "- Offline build SPTAG index for database vectors\n",
        "- Online serve the index to support vector search requests from the clients\n",
        "\n",
        "## Offline build SPTAG index\n",
        "\n",
        "> Prepare input vectors and metadatas for SPTAG"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": { },
      "outputs": [ ],
      "source": [
        "import os\n",
        "import numpy as np\n",
        "\n",
        "vector_number = 100\n",
        "vector_dimension = 10\n",
        "\n",
        "# Randomly generate the database vectors. Currently SPTAG only support int8, int16 and float32 data type.\n",
        "x = np.random.rand(vector_number, vector_dimension).astype(np.float32) \n",
        "\n",
        "# Prepare metadata for each vectors, separate them by '\\n'. Currently SPTAG python wrapper only support '\\n' as the separator\n",
        "m = ''\n",
        "for i in range(vector_number):\n",
        "    m += str(i) + '\\n'"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": { },
      "outputs": [ ],
      "source": [
        "> Build SPTAG index for database vectors **x**"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": { },
      "outputs": [
        {
          "data": {
            "text/plain": "['deletes.bin',\n 'graph.bin',\n 'indexloader.ini',\n 'metadata.bin',\n 'metadataIndex.bin',\n 'tree.bin',\n 'vectors.bin']"
          },
          "execution_count": 2,
          "metadata": { },
          "output_type": "execute_result"
        }
      ],
      "source": [
        "import SPTAG\n",
        "\n",
        "index = SPTAG.AnnIndex('BKT', 'Float', vector_dimension)\n",
        "\n",
        "# Set the thread number to speed up the build procedure in parallel \n",
        "index.SetBuildParam(\"NumberOfThreads\", '4')\n",
        "\n",
        "# Set the distance type. Currently SPTAG only support Cosine and L2 distances. Here Cosine distance is not the Cosine similarity. The smaller Cosine distance it is, the better.\n",
        "index.SetBuildParam(\"DistCalcMethod\", 'Cosine') \n",
        "\n",
        "if index.BuildWithMetaData(x, m, vector_number, False):\n",
        "    index.Save(\"sptag_index\") # Save the index to the disk\n",
        "\n",
        "os.listdir('sptag_index')"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": { },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "[47, 10, 92]\n[-1.0077600479125977, -0.9659097194671631, -0.9412962198257446]\n['47\\n', '10\\n', '92\\n']\n"
        }
      ],
      "source": [
        "# Local index test on the vector search\n",
        "index = SPTAG.AnnIndex.Load('sptag_index')\n",
        "\n",
        "# prepare query vector\n",
        "q = np.random.rand(vector_dimension).astype(np.float32)\n",
        "\n",
        "result = index.SearchWithMetaData(q, 3) # Search k=3 nearest vectors for query vector q\n",
        "print (result[0]) # nearest k vector ids\n",
        "print (result[1]) # nearest k vector distances\n",
        "print (result[2]) # nearest k vector metadatas"
      ]
    },
    {
      "cell_type": "markdown",
      "execution_count": null,
      "metadata": { },
      "outputs": [ ],
      "source": [
        "## Online serve the index\n",
        "\n",
        "Start the vector search service on the host machine which listens for the client requests on the port 8000\n",
        "\n",
        "> Write a server configuration file **service.ini** as follows:\n",
        "\n",
        "```bash\n",
        "[Service]\n",
        "ListenAddr=0.0.0.0\n",
        "ListenPort=8000\n",
        "ThreadNumber=8\n",
        "SocketThreadNumber=8\n",
        "\n",
        "[QueryConfig]\n",
        "DefaultMaxResultNumber=6\n",
        "DefaultSeparator=|\n",
        "\n",
        "[Index]\n",
        "List=MyIndex\n",
        "\n",
        "[Index_MyIndex]\n",
        "IndexFolder=sptag_index\n",
        "```\n",
        "\n",
        "> Start the server on the host machine\n",
        "\n",
        "```bash\n",
        "Server.exe -m socket -c service.ini\n",
        "```\n",
        "\n",
        "It will print the follow messages:\n",
        "\n",
        "```bash\n",
        "Setting TreeFilePath with value tree.bin\n",
        "Setting GraphFilePath with value graph.bin\n",
        "Setting VectorFilePath with value vectors.bin\n",
        "Setting DeleteVectorFilePath with value deletes.bin\n",
        "Setting BKTNumber with value 1\n",
        "Setting BKTKmeansK with value 32\n",
        "Setting BKTLeafSize with value 8\n",
        "Setting Samples with value 1000\n",
        "Setting TPTNumber with value 32\n",
        "Setting TPTLeafSize with value 2000\n",
        "Setting NumTopDimensionTpTreeSplit with value 5\n",
        "Setting NeighborhoodSize with value 32\n",
        "Setting GraphNeighborhoodScale with value 2\n",
        "Setting GraphCEFScale with value 2\n",
        "Setting RefineIterations with value 2\n",
        "Setting CEF with value 1000\n",
        "Setting MaxCheckForRefineGraph with value 8192\n",
        "Setting NumberOfThreads with value 4\n",
        "Setting DistCalcMethod with value Cosine\n",
        "Setting DeletePercentageForRefine with value 0.400000\n",
        "Setting AddCountForRebuild with value 1000\n",
        "Setting MaxCheck with value 8192\n",
        "Setting ThresholdOfNumberOfContinuousNoBetterPropagation with value 3\n",
        "Setting NumberOfInitialDynamicPivots with value 50\n",
        "Setting NumberOfOtherDynamicPivots with value 4\n",
        "Load Vector From sptag_index\\vectors.bin\n",
        "Load Vector (100, 10) Finish!\n",
        "Load BKT From sptag_index\\tree.bin\n",
        "Load BKT (1,101) Finish!\n",
        "Load Graph From sptag_index\\graph.bin\n",
        "Load Graph (100, 32) Finish!\n",
        "Load DeleteID From sptag_index\\deletes.bin\n",
        "Load DeleteID (100, 1) Finish!\n",
        "Start to listen 0.0.0.0:8000 ...\n",
        "```\n",
        "\n",
        "> Start python client to connect to the server and send vector search request."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": { },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": "[15, 35, 75]\n[-1.094923973083496, -1.0913333892822266, -1.0795197486877441]\n['15\\n', '35\\n', '75\\n']\n"
        }
      ],
      "source": [
        "import SPTAGClient\n",
        "import time\n",
        "\n",
        "# connect to the server\n",
        "client = SPTAGClient.AnnClient('127.0.0.1', '8000')\n",
        "while not client.IsConnected():\n",
        "    time.sleep(1)\n",
        "client.SetTimeoutMilliseconds(18000)\n",
        "\n",
        "k = 3\n",
        "vector_dimension = 10\n",
        "# prepare query vector\n",
        "q = np.random.rand(vector_dimension).astype(np.float32)\n",
        "\n",
        "result = client.Search(q, k, 'Float', True) # AnnClient.Search(query_vector, knn, data_type, with_metadata)\n",
        "\n",
        "print (result[0]) # nearest k vector ids\n",
        "print (result[1]) # nearest k vector distances\n",
        "print (result[2]) # nearest k vector metadatas\n",
        "\n"
      ]
    }
  ]
}
